Conversation

@mohitmundhragithub
Contributor

No description provided.

@mohitmundhragithub mohitmundhragithub requested review from a team and anhappdev as code owners August 26, 2025 05:41
github-actions bot commented Aug 26, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

…lemented performance benchmark for LLM pipeline
…y input and issue_query only handles output tokens
@farook-edev farook-edev changed the title Feat llm LLM pipeline implementation Sep 2, 2025
@farook-edev farook-edev linked an issue Sep 2, 2025 that may be closed by this pull request
Contributor Author

@mohitmundhragithub mohitmundhragithub left a comment

.

@farook-edev farook-edev marked this pull request as ready for review October 31, 2025 08:56
@farook-edev
Contributor

Regarding the iOS CI issue, the problem had two parts:

  1. Eigen enabled exceptions regardless of `-fno-exceptions`.
  2. TensorFlow -> XNNPACK -> FP16 -> math.h was incompatible with the x86_64 simulator.

Part 1 was resolved by adding a patch that, for the time being, creates a macro forcing exceptions off. The macro is enabled only for iOS builds.
NOTE: This will become unnecessary once TensorFlow is updated, since newer versions appear to drop Eigen as a dependency.

Part 2 was resolved by fetching the same FP16 version that TensorFlow 2.18.0 pulls in via XNNPACK and applying a patch that removes the math.h dependency.
NOTE: THIS WILL REDUCE iOS PERFORMANCE according to some comments on GitHub, and will be unnecessary once XNNPACK is upgraded, since XNNPACK reportedly dropped the FP16 dependency early this year.

@freedomtan
Contributor

@anhappdev please help check if it's safe to merge this into the master branch.

@freedomtan
Contributor

> Regarding the iOS CI issue, the problem had two parts:
>
> 1. Eigen enabled exceptions regardless of `-fno-exceptions`.
> 2. TensorFlow -> XNNPACK -> FP16 -> math.h was incompatible with the x86_64 simulator.
>
> Part 1 was resolved by adding a patch that, for the time being, creates a macro forcing exceptions off; the macro is enabled only for iOS builds. NOTE: This will become unnecessary once TensorFlow is updated, since newer versions appear to drop Eigen as a dependency.
>
> Part 2 was resolved by fetching the same FP16 version that TensorFlow 2.18.0 pulls in via XNNPACK and applying a patch that removes the math.h dependency. NOTE: THIS WILL REDUCE iOS PERFORMANCE according to some comments on GitHub, and will be unnecessary once XNNPACK is upgraded, since XNNPACK reportedly dropped the FP16 dependency early this year.

@farook-edev I’m not quite sure why there’s a compatibility issue with math.h. As far as I understand, math.h is ANSI C and POSIX compatible, so it shouldn’t have anything to do with 16-bit floating-point operations, since it predates the FP16 format's popularity.

Anyway, could we upgrade TensorFlow? Which version should we use? @anhappdev

@freedomtan
Contributor

@anhappdev let's create a submission-v6.0 based on this and keep upcoming updates/PRs to that branch.

@farook-edev
Contributor

cat mlperf_log_summary.txt
================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 53819307167
90th first token percentile latency (ns) : 42895741078
Result is : INVALID
  Min duration satisfied : Yes
  Min queries satisfied : Skipped
  Early stopping satisfied: NO
Recommendations:
 * The test exited early, before enough queries were issued.
   See the detailed log for why this may have occurred.
TTFT Early Stopping Result:

TPOT Early Stopping Result:
 * Only processed 8 queries.
 * Need to process at least 64 queries for early stopping.

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 0.02
QPS w/o loadgen overhead        : 0.02

Min latency (ns)                : 41576831338
Max latency (ns)                : 53819307167
Mean latency (ns)               : 45227442899
50.00 percentile latency (ns)   : 45318612899
90.00 percentile latency (ns)   : 53819307167
95.00 percentile latency (ns)   : 53819307167
97.00 percentile latency (ns)   : 53819307167
99.00 percentile latency (ns)   : 53819307167
99.90 percentile latency (ns)   : 53819307167

TPS w/ loadgen overhead         : 0.05
TPS w/o loadgen overhead        : 0.04
Min First Token latency (ns)                : 32835322331
Max First Token latency (ns)                : 42895741078
Mean First Token latency (ns)               : 36094364693
50.00 percentile first token latency (ns)   : 36311724777
90.00 percentile first token latency (ns)   : 42895741078
95.00 percentile first token latency (ns)   : 42895741078
97.00 percentile first token latency (ns)   : 42895741078
99.00 percentile first token latency (ns)   : 42895741078
99.90 percentile first token latency (ns)   : 42895741078

Min Time to Output Token (ns)                : 8396929372
Max Time to Output Token (ns)                : 10923566089
Mean Time to Output Token (ns)               : 9133078206
50.00 percentile time to output token (ns)   : 8970371559
90.00 percentile time to output token (ns)   : 10923566089
95.00 percentile time to output token (ns)   : 10923566089
97.00 percentile time to output token (ns)   : 10923566089
99.00 percentile time to output token (ns)   : 10923566089
99.90 percentile time to output token (ns)   : 10923566089

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1000
ttft_latency (ns): 100000000
tpot_latency (ns): 100000000
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 300000
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 1

No warnings encountered during test.

No errors encountered during test.

@freedomtan

@freedomtan
Contributor

@anhappdev and @freedomtan to check if the number of samples is one.

@farook-edev please upgrade the version of loadgen first.

@freedomtan
Contributor

mlperf_client testing method.

@Mostelk different input/output sizes appear to be too much for a mobile environment, so let's go with a simple configuration.

@anhappdev anhappdev changed the base branch from master to submission-v6.0 November 4, 2025 07:27
@anhappdev anhappdev merged commit bd890dd into submission-v6.0 Nov 4, 2025
30 checks passed
@anhappdev anhappdev deleted the feat-llm branch November 4, 2025 07:28
@github-actions github-actions bot locked and limited conversation to collaborators Nov 4, 2025
@anhappdev
Collaborator

> Anyway, could we upgrade TensorFlow? Which version should we use? @anhappdev

Regarding the TensorFlow version: if possible we should use the latest available, v2.20.0, because it works with Bazel 7, which will resolve the issue with the iOS build on macOS 26. TensorFlow v2.19 and earlier still use Bazel 6.

@anhappdev
Collaborator

@farook-edev Could you build the Android app on macOS? I'm getting this error:

ERROR: /Users/anh/dev/mlcommons/mobile_app_open/mobile_back_tflite/cpp/backend_tflite/BUILD:110:18: Linking mobile_back_tflite/cpp/backend_tflite/libtflitebackend.so failed: (Exit 1): clang failed: error executing command (from target //mobile_back_tflite/cpp/backend_tflite:libtflitebackend.so) 
  (cd /private/var/tmp/_bazel_anh/30b0ae0ebfc82d789f6eeabcb52f979d/execroot/mlperf_app && \
  exec env - \
    PATH='/Users/anh/Library/Caches/bazelisk/downloads/bazelbuild/bazel-6.3.2-darwin-arm64/bin:/Users/anh/sdk/venv/venv_p39/bin:/opt/homebrew/opt/node@18/bin:/Users/anh/cache/pub-cache/bin:/Users/anh/sdk/Flutter/flutter_3.19.6/bin:/Users/anh/sdk/Flutter/flutter_3.19.6/bin/cache/dart-sdk/bin:/Users/anh/sdk/Android/sdk/platform-tools:/Users/anh/sdk/Java/openjdk-17.0.1/Contents/Home/bin:/opt/homebrew/opt/ruby/bin:/opt/homebrew/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/Library/TeX/texbin:/Users/anh/sdk/venv/venv_p39/bin:/opt/homebrew/opt/node@18/bin:/Users/anh/cache/pub-cache/bin:/Users/anh/sdk/Flutter/flutter_3.19.6/bin:/Users/anh/sdk/Flutter/flutter_3.19.6/bin/cache/dart-sdk/bin:/Users/anh/sdk/Android/sdk/platform-tools:/Users/anh/sdk/Java/openjdk-17.0.1/Contents/Home/bin:/opt/homebrew/opt/ruby/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/Users/anh/Library/Application Support/JetBrains/Toolbox/scripts:/Users/anh/Library/Application Support/JetBrains/Toolbox/scripts' \
    PWD=/proc/self/cwd \
  external/androidndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang @bazel-out/arm64-v8a-opt/bin/mobile_back_tflite/cpp/backend_tflite/libtflitebackend.so-2.params)
# Configuration: 4a3d5e0ee256349d5cd18cd0b151ba6eaa79e5b36b30cd73291980202fce22d3
# Execution platform: @local_execution_config_platform//:platform
ld.lld: error: unknown argument '-framework'
ld.lld: error: cannot open CoreFoundation: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The iOS build works fine on macOS, though.
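The `ld.lld: error: unknown argument '-framework'` message suggests Apple-only linker flags are reaching the Android NDK toolchain, whose `ld.lld` does not understand them. Such flags are usually gated per platform with a Bazel `select()`; the target name and flag list below are illustrative, not the repo's actual BUILD rule:

```starlark
# Hypothetical sketch: keep Apple framework flags out of Android links.
cc_library(
    name = "backend_platform_deps",  # illustrative name
    linkopts = select({
        "@platforms//os:ios": ["-framework", "CoreFoundation"],
        "@platforms//os:macos": ["-framework", "CoreFoundation"],
        # Android / Linux: no Apple frameworks, so ld.lld never sees them.
        "//conditions:default": [],
    }),
)
```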

Successfully merging this pull request may close these issues.

Master issue: LLM Benchmark